IMPROVED ARCHITECTURE FOR GENERATING INTERMEDIATE
REPRESENTATIONS FOR PROGRAM CODE CONVERSION



The subject invention relates generally to the field 
of computers and computer software and, more particularly, 
to program code conversion methods and apparatus useful, 
for example, in code translators, emulators and 
accelerators.

Across the embedded and non-embedded CPU market, one
finds predominant Instruction Set Architectures (ISAs) for
which large bodies of software exist that could be
"Accelerated" for performance, or "Translated" to a myriad
of capable processors that could present better
cost/performance benefits, provided that they could
transparently access the relevant software. One also finds
dominant CPU architectures that are locked in time to
their ISA, and cannot evolve in performance or market
reach and would benefit from "Synthetic CPU"
co-architecture.

It is often desired to run program code written for a
computer processor of a first type (a "subject" processor)
on a processor of a second type (a "target" processor).
Here, an emulator or translator is used to perform program
code translation, such that the subject program is able to
run on the target processor. The emulator provides a
virtual environment, as if the subject program were
running natively on a subject processor, by emulating the
subject processor.

In the past, subject code has been converted to an
intermediate representation of a computer program during
run-time translation using so-called base nodes, as
described in WO 99/03168 entitled Program Code Conversion,
in connection with Figures 1 through 5 of this
application. Intermediate representation ("IR") is a term
widely used in the computer industry to refer to forms of
abstract computer language in which a program may be
expressed, but which is not specific to, and is not
intended to be directly executed on, any particular
processor. Program code conversion methods and apparatus
which facilitate such acceleration, translation and
co-architecture capabilities utilizing intermediate
representations are, for example, addressed in the
above-mentioned publication WO 99/03168.

According to the present invention there is provided
an apparatus and method as set forth in the appended
claims. Preferred features of the invention will be
apparent from the dependent claims, and the description
which follows.

The following is a summary of various aspects and
advantages realizable according to various embodiments of
the improved architecture for program code conversion
according to the present invention. It is provided as an
introduction to assist those skilled in the art to more
rapidly assimilate the detailed discussion of the
invention that ensues and does not and is not intended in
any way to limit the scope of the claims that are appended
hereto.

The various embodiments described below relate to
improved architectures for a program code conversion
apparatus and an associated method for converting subject
code executable in a subject computing environment to
target code executable in a target computing environment.
The program code conversion apparatus creates an
intermediate representation ("IR") of the subject code
which may then be optimized for the target computing
environment in order to more efficiently generate the
target code. Depending upon the particular architectures
of the subject and target computing environments involved
in the conversion, the program code conversion apparatus
of one embodiment determines which of the following types
of IR nodes to generate in the intermediate
representation: base nodes, complex nodes, polymorphic
nodes, and architecture-specific nodes. The program code
conversion architecture will by default generate base
nodes when creating the intermediate representation,
unless it is determined that another one of the types of
nodes would be more applicable to the particular
conversion being effected.






Base nodes provide a minimal set of nodes (i.e.,
abstract expressions) needed to represent the semantics of
any subject architecture running the subject code, such
that base nodes provide a RISC-like functionality. Complex
nodes are generic nodes that represent CISC-like semantics
of a subject architecture running the subject code in a
more compact representation than base nodes. While all
complex nodes could be decomposed into base node
representations with the same semantics, complex nodes
preserve the semantics of complex instructions in a single
IR node in order to improve the performance of the
translator. Complex nodes essentially augment the set of
base nodes for CISC-like instructions in the subject code.
Base nodes and complex nodes are both generically used
over a wide range of possible subject and target
architectures, thus allowing generic optimizations to be
performed on the corresponding IR trees comprised of base
nodes and complex nodes.

The program code conversion apparatus utilizes
polymorphic nodes in the intermediate representation when
the features of the target computing environment would
cause the semantics of the particular subject instruction
to be lost if realized as a generic IR node. The
polymorphic nodes contain a function pointer to a function
of the target computing environment specific to a
particular subject instruction in the source code. The
program code conversion apparatus further utilizes
architecture-specific nodes to provide target-specialized
conversion components for performing specialized code
generation functions for certain target computing
environments.

The improved IR generation methods hereafter described
allow the program code conversion apparatus to be
configurable to any subject and target processor
architecture pairing while maintaining an optimal level of
performance and maximizing the speed of translation.

For a better understanding of the invention, and to
show how embodiments of the same may be carried into
effect, reference will now be made, by way of example, to
the accompanying diagrammatic drawings in which:

Figure 1 shows an example computing environment
including subject and target computing environments;

Figure 2 shows a preferred program code conversion
apparatus;

Figure 3 is a schematic diagram of an illustrative 
computing environment illustrating translation of subject 
code to target code; 

Figure 4 is a schematic illustration of various 
intermediate representations realized by a program code 
conversion apparatus in accordance with a preferred 
embodiment of the present invention; 

Figure 5 is a detailed schematic diagram of a 
preferred program code conversion apparatus; 

Figure 6 shows example IR trees generated using base 
nodes and complex nodes; 

Figure 7 is a schematic diagram illustrating an
example of ASN generation for implementation of the
present invention in an accelerator;

Figure 8 is a schematic diagram illustrating an
example of ASN generation for implementation of the
present invention in a translator;

Figure 9 is an operational flow diagram of the
translation process when utilizing ASNs in accordance with
a preferred embodiment of the present invention;

Figure 10 is a schematic diagram illustrating an
example of a translation process and corresponding IR
generated during the process;

Figure 11 is a schematic diagram illustrating another
example of a translation process and corresponding IR
generated during the process; and

Figure 12 is a schematic diagram illustrating a
further example of a translation process and corresponding
IR generated during the process.

The following description is provided to enable any
person skilled in the art to make and use the invention
and sets forth the best modes contemplated by the
inventors of carrying out their invention. Various
modifications, however, will remain readily apparent to
those skilled in the art, since the general principles of
the present invention have been defined herein
specifically to provide an improved architecture for a
program code conversion apparatus.

Referring to Figure 1, an example computing
environment is shown including a subject computing
environment 1 and a target computing environment 2. In the
subject environment 1, subject code 10 is executable
natively on a subject processor 12. The subject processor
12 includes a set of subject registers 14. Here, the
subject code 10 may be represented in any suitable
language with intermediate layers (e.g., compilers)
between the subject code 10 and the subject processor 12,
as will be familiar to a person skilled in the art.

It is desired to run the subject code 10 in the target
computing environment 2, which provides a target processor
22 using a set of target registers 24. These two
processors 12 and 22 may be inherently non-compatible,
such that these two processors use different instruction
sets. Hence, a program code conversion architecture 30 is
provided in the target computing environment 2, in order
to run the subject code 10 in that non-compatible
environment. The program code conversion architecture 30
may comprise a translator, emulator, accelerator, or any
other architecture suitable for converting program code
designed for one processor type to program code executable
on another processor type. For the purposes of the
discussion of the present invention following hereafter,
the program code conversion architecture 30 will be
referred to as the "translator 30". It should be noted
that the two processors 12 and 22 may also be of the same
architecture type, such as in the case of an accelerator.

The translator 30 performs a translation process on
the subject code 10 and provides a translated target code
20 for execution by the target processor 22. Suitably, the
translator 30 performs binary translation, wherein subject
code 10 in the form of executable binary code appropriate
to the subject processor 12 is translated into executable
binary code appropriate to the target processor 22.
Translation can be performed statically or dynamically. In
static translation, an entire program is translated prior
to execution of the translated program on the target
processor. This involves a significant delay. Therefore,
the translator 30 preferably dynamically translates small
sections of the subject code 10 for execution immediately
on the target processor 22. This is much more efficient,
because large sections of the subject code 10 may not be
used in practice or may be used only rarely.

Referring now to Figure 2, a preferred embodiment of
the translator 30 is illustrated in more detail,
comprising a front end 31, a kernel 32 and a back end 33.
The front end 31 is configured specific to the subject
processor 12 associated with the subject code. The front
end 31 takes a predetermined section of the subject code
10 and provides a block of a generic intermediate
representation (an "IR block"). The kernel 32 optimizes
each IR block generated by the front end 31 by employing
optimization techniques, as readily known to those skilled
in the art. The back end 33 takes optimized IR blocks from
the kernel 32 and produces target code 20 executable by
the target processor 22.

Suitably, the front end 31 divides the subject code 10
into basic blocks, where each basic block is a sequential
set of instructions between a first instruction at a
unique entry point and a last instruction at a unique exit
point (such as a jump, call or branch instruction). The
kernel 32 may select a group block comprising two or more
basic blocks which are to be treated together as a single
unit. Further, the front end 31 may form iso-blocks
representing the same basic block of subject code under
different entry conditions. In use, a first predetermined
section of the subject code 10 is identified, such as a
basic block, and is translated by the translator 30
running on the target processor 22 in a translation mode.
The target processor 22 then executes the corresponding
optimized and translated block of target code 20.

The translator 30 includes a plurality of abstract
registers 34, suitably provided in the kernel 32, which
represent the physical subject registers 14 that would be
used within the subject processor 12 to execute the
subject code 10. The abstract registers 34 define the
state of the subject processor 12 being emulated by
representing the expected effects of the subject code
instructions on the subject processor registers.

A structure employing such an implementation is shown
in Figure 3. As shown, compiled native subject code
resides in an appropriate computer memory storage
medium 100, the particular and alternative memory storage
mechanisms being well-known to those skilled in the art.
The software components include native subject code to be
translated, translator code, translated code, and an
operating system. The translator code, i.e., the compiled
version of the source code implementing the translator, is
similarly resident on an appropriate computer memory
storage medium 102. The translator runs in conjunction
with the memory-stored operating system 104 such as, for
example, UNIX running on the target processor 106,
typically a microprocessor or other suitable computer. It
will be appreciated that the structure illustrated in
Figure 3 is exemplary only and that, for example, methods
and processes according to the invention may be
implemented in code residing with or beneath an operating
system. The translated code is shown residing in an
appropriate computer memory storage medium 108. The
subject code, translator code, operating system,
translated code and storage mechanisms may be any of a
wide variety of types, as known to those skilled in the
art.

In a preferred embodiment of the present invention,
program code conversion is performed dynamically, at
run-time, while the translated program is running in the
target computing environment. The translator 30 runs
inline with the translated program. The execution path of
the translated program is a control loop comprising the
steps of: executing translator code which translates a
block of the subject code into translated code, and then
executing that block of translated code; the end of each
block of translated code contains instructions to return
control back to the translator code. In other words, the
steps of translating and then executing the subject code
are interlaced, such that only portions of the subject
program are translated at a time.
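
By way of illustration only, the interlaced translate-and-execute
control loop described above might be sketched as follows. All
names here (SubjectAddr, TranslatedBlock, Translator, translate,
cache) are hypothetical and do not appear in the specification.
A translated block is modelled as a callable that returns the
subject address of the next block, with 0 meaning "halt"; the
cache of previously translated blocks is an added assumption,
not something the passage above describes.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical types: a subject address, and a translated block which,
// when executed, hands back the subject address of the next block.
using SubjectAddr = std::uint32_t;
using TranslatedBlock = std::function<SubjectAddr()>;

struct Translator {
    // Stand-in for "decode, realize IR, optimize, plant" for one block.
    std::function<TranslatedBlock(SubjectAddr)> translate;
    // Assumption: translated blocks are kept for reuse.
    std::unordered_map<SubjectAddr, TranslatedBlock> cache;
    int translations = 0;

    // Control loop: translate a block, execute it, regain control, repeat.
    void run(SubjectAddr pc) {
        assert(translate && "a translate callback must be installed");
        while (pc != 0) {
            auto it = cache.find(pc);
            if (it == cache.end()) {
                it = cache.emplace(pc, translate(pc)).first;
                ++translations;
            }
            // The end of each block returns control here.
            pc = it->second();
        }
    }
};
```

Only blocks that execution actually reaches are ever passed to
translate, which is the property the paragraph above attributes
to dynamic translation.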

The translator 30's fundamental unit of translation is
the basic block, meaning that the translator 30 translates
the subject code one basic block at a time. A basic block
is formally defined as a section of code with exactly one
entry point and exactly one exit point, which limits the
block code to a single control path. For this reason,
basic blocks are the fundamental unit of control flow.
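
As a minimal sketch of the basic block definition above, the
following assumes a hypothetical pre-decoded instruction list in
which each instruction is already marked as a control transfer
(jump, call or branch) or not. The names Insn, BasicBlock and
decodeBasicBlock are illustrative only. Decoding starts at a
given entry point and stops at the first control transfer, so the
resulting block has one entry and one exit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical decoded subject instruction.
struct Insn {
    std::string mnemonic;
    bool isControlTransfer;  // jump, call or branch
};
using BasicBlock = std::vector<Insn>;

// Collects instructions from `entry` up to and including the first
// control transfer (or the end of the code, whichever comes first).
BasicBlock decodeBasicBlock(const std::vector<Insn>& code, std::size_t entry) {
    assert(entry < code.size());
    BasicBlock block;
    for (std::size_t i = entry; i < code.size(); ++i) {
        block.push_back(code[i]);
        if (code[i].isControlTransfer) break;  // unique exit point reached
    }
    return block;
}
```

This sketch does not split a block when a later branch targets
its interior; how such cases are handled is not stated in the
text above.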

Intermediate Representation (IR) Trees 

In the process of generating translated code,
intermediate representation ("IR") trees are generated
based on the subject instruction sequence. IR trees
comprise nodes that are abstract representations of the
expressions calculated and operations performed by the
subject program. The translated code is then generated
based on the IR trees. The collections of IR nodes
described herein are colloquially referred to as "trees".
We note that, formally, such structures are in fact
directed acyclic graphs (DAGs), not trees. The formal
definition of a tree requires that each node have at most
one parent. Because the embodiments described use common
subexpression elimination during IR generation, nodes will
often have multiple parents. For example, the IR of a
flag-affecting instruction result may be referred to by
two abstract registers, those corresponding to the
destination subject register and the flag result
parameter.



For example, the subject instruction (add %r1, %r2,
%r3) performs the addition of the contents of subject
registers %r2 and %r3 and stores the result in subject
register %r1. Thus, this instruction corresponds to the
abstract expression "%r1 = %r2 + %r3". This example
contains a definition of the abstract register %r1 with an
add expression containing two subexpressions representing
the instruction operands %r2 and %r3. In the context of a
subject program, these subexpressions may correspond to
other, prior subject instructions, or they may represent
details of the current instruction such as immediate
constant values.



When the "add" instruction is parsed, a new "+" IR
node is generated, corresponding to the abstract
mathematical operator for addition. The "+" IR node stores
references to other IR nodes that represent the operands
(held in subject registers, represented as subexpression
trees). The "+" node is itself referenced by the
appropriate subject register definition (the abstract
register for %r1, the instruction's destination register).
As those skilled in the art may appreciate, in one
embodiment the translator is implemented using an
object-oriented programming language such as C++. For
example, an IR node is implemented as a C++ object, and
references to other nodes are implemented as C++
references to the C++ objects corresponding to those other
nodes. An IR tree is therefore implemented as a collection
of IR node objects, containing various references to each
other.
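
A minimal sketch of IR nodes as C++ objects is given below,
purely as an illustration. The names IRNode, IRRef, makeNode and
show are hypothetical, and std::shared_ptr is used here in place
of the plain C++ references mentioned above so that one node can
safely be referenced by several parents (the DAG case noted
earlier).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical IR node: an operator or leaf label plus references
// to the nodes that represent its operands.
struct IRNode {
    std::string op;                                 // e.g. "+", "%r2"
    std::vector<std::shared_ptr<IRNode>> operands;  // references to other nodes
};
using IRRef = std::shared_ptr<IRNode>;

IRRef makeNode(std::string op, std::vector<IRRef> operands = {}) {
    return std::make_shared<IRNode>(IRNode{std::move(op), std::move(operands)});
}

// Renders a tree as a prefix expression, e.g. "(+ %r2 %r3)".
std::string show(const IRRef& n) {
    if (n->operands.empty()) return n->op;
    std::string s = "(" + n->op;
    for (const IRRef& o : n->operands) s += " " + show(o);
    return s + ")";
}
```

For "add %r1, %r2, %r3", a "+" node built with
makeNode("+", {r2, r3}) holds references to the two operand
subtrees rather than copies of them.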

Abstract Registers 

Further, in the embodiment under discussion, IR
generation uses a set of abstract registers 34. These
abstract registers 34 correspond to specific features of
the subject architecture. For example, there is a unique
abstract register 34 for each physical register 14 on the
subject architecture 12. Abstract registers 34 serve as
placeholders for IR trees during IR generation. For
example, the value of subject register %r2 at a given
point in the subject instruction sequence is represented
by a particular IR expression tree, which is associated
with the abstract register 34 for subject register %r2. In
one embodiment, an abstract register 34 is implemented as
a C++ object, which is associated with a particular IR
tree via a C++ reference to the root node object of that
tree.

In the example instruction sequence described above,
the translator 30 has already generated IR trees
corresponding to the values of %r2 and %r3 while parsing
the subject instructions that precede the "add"
instruction. In other words, the subexpressions that
calculate the values of %r2 and %r3 are already
represented as IR trees. When generating the IR tree for
the "add %r1, %r2, %r3" instruction, the new "+" node
contains references to the IR subtrees for %r2 and %r3.
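
The role of abstract registers as placeholders for IR trees can
be sketched as follows. This is an illustrative assumption, not
the implementation of the embodiment: the names
AbstractRegister, AbstractRegisterFile, use, define and
realizeAdd are hypothetical, and a leaf labelled "@reg" is used
here to stand for a value that must be loaded from the register
store because the current block has not yet defined it.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct IRNode {
    std::string op;                    // "+", or "@%r2" for a register-store load
    std::shared_ptr<IRNode> lhs, rhs;  // operand subtrees (null for leaves)
};
using IRRef = std::shared_ptr<IRNode>;

// Placeholder: refers to the root of the IR tree defining the register.
struct AbstractRegister { IRRef tree; };

class AbstractRegisterFile {
public:
    // Returns the tree currently defining `reg`; on first use, creates
    // a leaf standing for a load from the register store.
    IRRef use(const std::string& reg) {
        AbstractRegister& ar = regs_[reg];
        if (!ar.tree)
            ar.tree = std::make_shared<IRNode>(IRNode{"@" + reg, nullptr, nullptr});
        return ar.tree;
    }
    void define(const std::string& reg, IRRef tree) { regs_[reg].tree = std::move(tree); }

    // Realizes "add dst, a, b": a "+" node referencing the operand subtrees.
    void realizeAdd(const std::string& dst, const std::string& a, const std::string& b) {
        IRRef lhs = use(a), rhs = use(b);
        define(dst, std::make_shared<IRNode>(IRNode{"+", lhs, rhs}));
    }
private:
    std::map<std::string, AbstractRegister> regs_;
};
```

After realizeAdd("%r1", "%r2", "%r3"), the abstract register for
%r1 refers to a "+" node whose operands are the very trees held
by the abstract registers for %r2 and %r3. A later instruction
that reads %r1 twice would reference that one node from two
places, which is how a node comes to have multiple parents.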

The implementation of the abstract registers 34 is
divided between components in both the translator 30 and
the translated code. In the context of the translator, an
abstract register is a placeholder used in the course of
IR generation, such that the abstract register 34 is
associated with the IR tree that calculates the value of
the subject register 14 to which a particular abstract
register 34 corresponds. As such, abstract registers 34 in
the translator may be implemented as a C++ object which
contains a reference to an IR node object (i.e., an IR
tree). In the context of the translated code, an abstract
register 34 is a specific location within the abstract
register store, to and from which subject register 14
values are synchronized with the actual target registers
24. Alternatively, when a value has been loaded from the
abstract register store, an abstract register 34 in the
translated code could be understood to be the target
register 26 which temporarily holds a subject register
value during the execution of the translated code, prior
to being saved back to the register store.

An example of program translation as described is
illustrated in Figure 4. Figure 4 shows the translation of
two basic blocks of x86 instructions, and the corresponding
IR trees that are generated in the process of translation.
The left side of Figure 4 shows the execution path of the
emulator during translation. The translator 30 translates
151 a first basic block of subject code 153 into target
code and then executes 155 that target code. When the
target code finishes execution, control is returned to the
emulator 157. The translator 30 then translates 157 the
next basic block of subject code 159 into target code and
executes 161 that target code, and so on.

In the course of translating 151 the first basic block
of subject code 153 into target code, the translator 30
generates an IR tree 163 based on that basic block. In
this case, the IR tree 163 is generated from the source
instruction "add %ecx, %edx," which is a flag-affecting
instruction. In the course of generating the IR tree 163,
four abstract registers are defined by this instruction:
the destination subject register %ecx 167, the first
flag-affecting instruction parameter 169, the second
flag-affecting instruction parameter 171, and the
flag-affecting instruction result 173. The IR tree
corresponding to the "add" instruction is simply a "+"
(arithmetic addition) operator 175, whose operands are the
subject registers %ecx 177 and %edx 179.

Emulation of the first basic block puts the flags in a
pending state by storing the parameters and result of the
flag-affecting instruction. The flag-affecting
instruction is "add %ecx, %edx." The parameters of the
instruction are the current values of emulated subject
registers %ecx 177 and %edx 179. The "@" symbol preceding
the subject register uses 177, 179 indicates that the
values of the subject registers are retrieved from the
global register store, from the locations corresponding to
%ecx and %edx, respectively, as these particular subject
registers were not previously loaded by the current basic
block. These parameter values are then stored in the first
169 and second 171 flag parameter abstract registers. The
result of the addition operation 175 is stored in the flag
result abstract register 173.

After the IR tree is generated, the corresponding
target code is generated based on the IR. The process of
generating target code from a generic IR is well
understood in the art. Target code is inserted at the end
of the translated block to save the abstract registers,
including those for the flag result 173 and the flag
parameters 169, 171, to the global register store. After
the target code is generated, it is then executed 155.

In the course of translating 157 the second basic
block of subject code 159, the translator 30 generates an
IR tree 165 based on that basic block. The IR tree 165 is
generated from the source instruction "pushf," which is a
flag-using instruction. The semantics of the "pushf"
instruction are to store the values of all condition flags
onto the stack, which requires that each flag be
explicitly calculated. As such, the abstract registers
corresponding to four condition flag values are defined
during IR generation: the zero flag ("ZF") 181, the sign
flag ("SF") 183, the carry flag ("CF") 185, and the
overflow flag ("OF") 187. Node 195 is the arithmetic
comparison operator "unsigned less-than." The calculation
of the condition flags is based on information from the
prior flag-affecting instruction, which in this case is
the "add %ecx, %edx" instruction from the first basic
block 153. The IR calculating the condition flag values
165 is based on the result 189 and parameters 191, 193 of
the flag-affecting instruction. As above, the "@" symbol
preceding the flag parameter labels indicates that the
emulator inserts target code to load those values from the
global register store prior to their use.

Thus, the second basic block forces the flag values to
be normalized. After the flag values are calculated and
used (by the target code emulating the "pushf"
instruction), they will be stored into the global register
store. Simultaneously, the pending flag abstract registers
(parameters and result) are put into an undefined state to
reflect the fact that the flag values are stored
explicitly (i.e., the flags have been normalized).
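
The pending-flag scheme described above may be illustrated by
the following sketch for a 32-bit "add". The type and function
names (PendingAddFlags, Flags, emulateAdd, normalize) are
hypothetical. The flag formulas are the standard ones for
two's-complement addition rather than anything taken from Figure
4; in particular, deriving the carry flag with an unsigned
less-than comparison of the result against the first parameter
is an assumption about what node 195 is used for.

```cpp
#include <cassert>
#include <cstdint>

// Pending state saved by a flag-affecting "add": both parameters
// and the result, with no flag value computed yet.
struct PendingAddFlags { std::uint32_t param1, param2, result; };

struct Flags { bool zf, sf, cf, of; };

// Emulates the add and records the pending flag state.
PendingAddFlags emulateAdd(std::uint32_t a, std::uint32_t b) {
    return {a, b, a + b};  // unsigned arithmetic wraps modulo 2^32
}

// Normalization: explicit flag values are computed only when a
// flag-using instruction (such as pushf) actually needs them.
Flags normalize(const PendingAddFlags& p) {
    Flags f;
    f.zf = (p.result == 0);
    f.sf = (p.result >> 31) != 0;
    f.cf = p.result < p.param1;  // unsigned less-than: carry out of bit 31
    // Signed overflow: operands share a sign that differs from the result's.
    f.of = ((~(p.param1 ^ p.param2) & (p.param1 ^ p.result)) >> 31) != 0;
    return f;
}
```

If no flag-using instruction follows the "add", normalize is
never called and the cost of computing four flag values is
avoided.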

Figure 5 shows the translator 30 formed in accordance
with a preferred embodiment of the present invention
capable of generating several different types of IR nodes
that may be used in translation as well as illustrating
how the implementations of those different types of IR
nodes are distributed between the frontend 31, kernel 32,
and backend 33 components of the translator 30. The term
"realize" refers to IR generation, which is performed in
the frontend 31 as subject instructions of the subject
code 10 are decoded (i.e., parsed). The term "plant"
refers to target code generation, which is performed in
the backend 33.

Note that while the translation process is described
below in terms of a single subject instruction, these
operations actually take place for an entire basic block
of subject instructions at once as described above. In
other words, the entire basic block is initially decoded
to generate an IR forest, then the kernel 32 applies
optimizations to the whole IR forest. Lastly, the backend
33 performs target code generation for the optimized IR
forest one node at a time.

When generating an IR forest for a basic block, the
translator 30 may generate either base nodes, complex
nodes, polymorphic nodes, or architecture-specific nodes
(ASNs), or any combination thereof, depending upon the
desired translator performance and the particular
architectures of the source processor and target processor
pairing.

Base Nodes 

Base nodes are abstract representations of the
semantics (i.e., the expressions, calculations, and
operations) of any subject architecture and provide the
minimal set of standard or basic nodes needed to represent
the semantics of the subject architecture. As such, base
nodes provide simple Reduced Instruction Set Computer
(RISC)-like functionality, such as, for instance, an "add"
operation. In contrast to other types of nodes, each base
node is irreducible, meaning that it cannot be broken down
any further into other IR nodes. Due to their simplicity,
base nodes are also easily translated by the translator 30
into target instructions on all backends 33 (i.e., target
architectures).

When utilizing only base IR nodes, the translation
process takes place entirely at the top portion of Figure
5 (i.e., paths traveling through the "Base IR" block 204).
The front-end 31 decodes a subject instruction from the
subject program code 10 in decode block 200, and realizes
(generates) in realize block 202 a corresponding IR tree
made of base nodes. The IR tree is then passed from the
front-end 31 to the Base IR block 204 in kernel 32, where
optimizations are applied to an entire IR forest. As the
IR forest optimized by the Base IR block 204 consists only
of base nodes, it is entirely generic to any processor
architecture. The optimized IR forest is then passed from
the Base IR block 204 in the kernel 32 to the backend 33,
which plants (generates) corresponding target code
instructions for each IR node in Plant block 206. The
target code instructions are then encoded by encode block
208 for execution by the target processor.

As noted above, base nodes are easily translated into
target instructions on all backends 33, and the
translated code can typically be generated entirely
through exclusive utilization of base nodes. While the
exclusive use of base nodes is very quick to implement for
the translator 30, it yields suboptimal performance in the
translated code. In order to increase the performance of
the translated code, the translator 30 can be specialized
to exploit features of the target processor architecture
by using alternative types of IR nodes, such as complex
nodes, polymorphic nodes, and architecture-specific nodes
(ASNs).

Complex Nodes

Complex nodes are generic nodes that represent the
semantics of a subject architecture in a more compact
representation than base nodes. Complex nodes provide a
"Complex Instruction Set Computer (CISC)-like"
functionality such as "add_imm" (add register and
immediate constant), for example. Specifically, complex
nodes typically represent instructions with immediate
constant fields. Immediate-type instructions are
instructions in which a constant operand value is encoded
into the instruction itself in an "immediate" field. For
constant values that are small enough to fit into
immediate fields, such instructions avoid the use of one
register to hold the constant. For complex instructions,
complex nodes can represent the semantics of the complex
instructions with far fewer nodes than equivalent base
node representations characterizing the same semantics.
While complex nodes can essentially be decomposed into
base node representations having the same semantics,
complex nodes are useful in preserving the semantics of
immediate-type instructions in a single IR node, thus
improving the performance of the translator 30.
Furthermore, in some situations, the semantics of the
complex instructions would be lost by representing the
complex instructions in terms of base nodes, and complex
nodes thus essentially augment the base node set to
include IR nodes for such "CISC-like" instructions.

With reference to Figure 6, an example of the
efficiency achieved by using a complex node as compared to
that of base nodes will now be described. For example, the
MIPS add-immediate instruction "addi r1, #10" adds ten to
the value held in register r1. Rather than loading the
constant value (10) into a register and then adding two
registers, the addi instruction simply encodes the
constant value 10 directly into the instruction field
itself, thus avoiding the need to use a second register.
When generating an intermediate representation of these
semantics strictly using base nodes, the base node
representation for this instruction would first load the
constant value 10 from the const(#10) node 60 into a
register node r(x) 61, and then perform an addition of the
register node r1 62 and register node r(x) 61 using add
node 63. The complex node representation consists of a
single "add to immediate" IR node 70 containing the
constant value 10 at portion 72 of the node 70 and a
reference to register r1 74. In the base node scenario,
the backend 33 would need to perform idiom recognition
capable of recognizing a four node pattern, shown in
Figure 6, in order to recognize and generate an "add to
immediate" target instruction. In the absence of idiom
recognition, the backend 33 would emit an extra
instruction to load the constant value 10 into a register
prior to performing a register-register addition.
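
By way of illustration only, the node counts compared above might be sketched in C++ as follows. The Node structure, its field names, the operator strings, and the helper functions are assumptions made for this sketch and are not taken from the translator 30; the register number 99 merely stands in for the temporary register r(x).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical IR node; field names are assumptions made for illustration.
struct Node {
    std::string op;                           // "add", "reg", "const" or "add_imm"
    long imm = 0;                             // constant payload, when present
    int reg = -1;                             // referenced register number, when present
    std::vector<std::shared_ptr<Node>> kids;  // operand subtrees
};
using NodePtr = std::shared_ptr<Node>;

NodePtr makeNode(std::string op, long imm, int reg, std::vector<NodePtr> kids = {}) {
    assert(!op.empty());
    auto n = std::make_shared<Node>();
    n->op = std::move(op);
    n->imm = imm;
    n->reg = reg;
    n->kids = std::move(kids);
    return n;
}

// Base node form of "addi r1, #10": const(#10) feeds r(x), then add(r1, r(x)).
NodePtr buildBaseAddi() {
    NodePtr c10 = makeNode("const", 10, -1);
    NodePtr rx = makeNode("reg", 0, 99, {c10});  // 99 stands in for r(x)
    NodePtr r1 = makeNode("reg", 0, 1);
    return makeNode("add", 0, -1, {r1, rx});
}

// Complex node form: one node carrying the constant and the register reference.
NodePtr buildComplexAddi() {
    return makeNode("add_imm", 10, 1);
}

// Counts every node reachable from the given root, including the root.
std::size_t countNodes(const Node& n) {
    std::size_t total = 1;
    for (const NodePtr& k : n.kids) total += countNodes(*k);
    return total;
}
```

Under these assumptions, countNodes reports four nodes for the base node tree and one node for the complex node, matching the four node pattern and the single "add to immediate" node of Figure 6.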

Complex nodes reduce the need for idiom recognition in
the backend 33, because complex nodes contain more
semantic information than their base node equivalents.
Specifically, complex nodes avoid the need for backend 33
idiom recognition of constant operands. By comparison, if
an immediate type subject instruction were decomposed into
base nodes (and the target architecture also contained
immediate type instructions), then the translator 30 would
either need expensive backend 33 idiom recognition to
identify the multiple node cluster as an immediate
instruction candidate, or generate inefficient target code
(i.e., more instructions than necessary, using more target
registers than necessary). In other words, by utilizing
base nodes alone, performance is lost either in the
translator 30 (through idiom recognition) or the
translated code (through extra generated code without
idiom recognition). More generally, because complex nodes
are a more compact representation of semantic information,
they reduce the number of IR nodes that the translator 30
must create, traverse, and delete.
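
The trade-off described above may be sketched, under stated assumptions, as a small C++ code generator. The pattern matcher, the pseudo target mnemonics ("li", "add", "addi"), and all type and function names are hypothetical and serve only to show where the cost falls: either the matcher runs in the translator, or an extra instruction and register appear in the generated code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical IR node, assumed for this sketch only.
struct Node {
    std::string op;  // "add", "reg", "const" or "add_imm"
    long imm = 0;
    int reg = -1;
    std::vector<std::shared_ptr<Node>> kids;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr node(std::string op, long imm, int reg, std::vector<NodePtr> kids = {}) {
    auto n = std::make_shared<Node>();
    n->op = std::move(op);
    n->imm = imm;
    n->reg = reg;
    n->kids = std::move(kids);
    return n;
}

// Four node base pattern and its single complex node equivalent.
NodePtr buildBaseAddi() {
    return node("add", 0, -1,
                {node("reg", 0, 1), node("reg", 0, 2, {node("const", 10, -1)})});
}
NodePtr buildComplexAddi() { return node("add_imm", 10, 1); }

// Idiom recognition: succeeds only for the shape add(reg, reg fed by const).
bool matchAddImmediate(const Node& n, int& reg, long& imm) {
    if (n.op != "add" || n.kids.size() != 2) return false;
    const Node& lhs = *n.kids[0];
    const Node& rhs = *n.kids[1];
    if (lhs.op != "reg" || !lhs.kids.empty()) return false;
    if (rhs.op != "reg" || rhs.kids.size() != 1) return false;
    if (rhs.kids[0]->op != "const") return false;
    reg = lhs.reg;
    imm = rhs.kids[0]->imm;
    return true;
}

std::string addImmediate(int reg, long imm) {
    return "addi r" + std::to_string(reg) + ", #" + std::to_string(imm);
}

// Emits pseudo target instructions for an add-immediate tree.
std::vector<std::string> generate(const Node& n, bool idiomRecognition) {
    if (n.op == "add_imm") return {addImmediate(n.reg, n.imm)};  // no matching needed
    int reg = -1;
    long imm = 0;
    if (idiomRecognition && matchAddImmediate(n, reg, imm)) {
        return {addImmediate(reg, imm)};
    }
    // Without idiom recognition: load the constant, then add two registers.
    assert(n.op == "add" && n.kids.size() == 2 && n.kids[1]->kids.size() == 1);
    const Node& lhs = *n.kids[0];
    const Node& rhs = *n.kids[1];
    return {"li r" + std::to_string(rhs.reg) + ", #" + std::to_string(rhs.kids[0]->imm),
            "add r" + std::to_string(lhs.reg) + ", r" + std::to_string(rhs.reg)};
}
```

In this sketch the complex node reaches the one instruction result without invoking matchAddImmediate at all, whereas the base node tree reaches it only when the matcher is enabled.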

Immediate type instructions are common to many
architectures. Therefore, complex nodes are generic in
that they are reusable across a range of architectures.
However, not every complex node is present in the IR node
set of every translator. Certain generic features of the
translator are configurable, meaning that when a
translator is being compiled for a particular pair of
source and target architectures, features that do not
apply to that translator configuration can be excluded
from compilation. For example, in a MIPS-MIPS (MIPS to
MIPS) translator, complex nodes that do not match the
semantics of any MIPS instructions are excluded from the
IR node set because they would never be utilized.

Complex nodes can further improve the performance of
the target code generated using an in-order traversal.
In-order traversal is one of several alternative IR
traversal algorithms that determine the order in which IR
nodes within an IR tree are generated into target code.
Specifically, in-order traversal generates each IR node as
it is first traversed, which precludes backend 33 idiom
recognition due to the absence of a separate optimization
pass over the entire IR tree. Complex nodes represent more
semantic information per node than base nodes, and thus
some of the work of idiom recognition is implicit within
the complex nodes themselves. This allows the translator
30 to use in-order traversal without suffering as much of
a penalty in target code performance as it would with base
nodes alone.

When the translator 30 generates complex nodes (i.e.,
the paths travelling through the Complex IR block 210 in
Figure 5), the translation process is similar to the
translation process described above for the base nodes.
The only difference is that subject instructions that
match the semantics of a complex node are realized as
complex nodes in Realize block 202 rather than base nodes
(as illustrated by the dotted line separating Realize
block 202). Complex nodes are still generic across a wide
range of architectures, which enables the kernel 32
optimizations to still apply to the entire IR forest.
Furthermore, target code generation for complex nodes on
CISC type target architectures may be more efficient than
the base node equivalents.

Polymorphic Nodes 

A preferred embodiment of the translator 30 as
illustrated in Figure 5 may further utilize polymorphic
intermediate representation. Polymorphic intermediate
representation is a mechanism by which the backend 33 can
provide specialized code generation to efficiently utilize
target architecture features for specific, performance
critical subject instructions. The polymorphic mechanism
is implemented as a generic polymorphic node which
contains a function pointer to a backend 33 code
generation function. Each function pointer is specialized
to a particular subject instruction. This polymorphic
mechanism preempts the standard frontend 31 IR generation
mechanism, which would otherwise decode the subject
instruction into base or complex nodes. Without the
polymorphic mechanism, the generation of those base nodes
would, in the backend 33, either result in suboptimal
target code or require expensive idiom recognition to
reconstruct the semantics of the subject instruction.

Each polymorphic function is specific to a particular
subject instruction and target architecture function
pairing. Polymorphic nodes expose minimal information
about their function to the kernel 32. Polymorphic nodes
are able to take part in normal kernel 32 optimizations,
such as expression sharing and expression folding. The
kernel 32 can use the function pointer to determine if two
polymorphic nodes are the same. Polymorphic nodes do not
retain any semantic information of the subject
instruction, but such semantic information can be inferred
from the function pointer.
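
A polymorphic node of the kind described above might be sketched in C++ as follows. The PolyNode layout, the function signature, and the two stand-in generator functions are hypothetical; in particular, comparing the operand lists in addition to the function pointers is an assumption of this sketch about what expression sharing would require, and the emitted strings are placeholders rather than real target instructions.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct PolyNode;

// Assumed signature of a backend code generation function.
using CodeGenFn = void (*)(const PolyNode&, std::vector<std::string>&);

// Hypothetical polymorphic node: a function pointer plus operand references.
struct PolyNode {
    CodeGenFn generate;         // pointer to the backend function
    std::vector<int> operands;  // abstract register numbers (assumed)
};

// Stand-ins for backend functions, each specialized to one subject instruction.
void genShl64(const PolyNode& n, std::vector<std::string>& out) {
    out.push_back("shl64 sequence for r" + std::to_string(n.operands.at(0)));
}
void genMffs(const PolyNode& n, std::vector<std::string>& out) {
    out.push_back("mffs sequence for r" + std::to_string(n.operands.at(0)));
}

// Kernel-side check: two nodes are treated as the same expression when they
// carry the same function pointer and the same operands.
bool sameExpression(const PolyNode& a, const PolyNode& b) {
    return a.generate == b.generate && a.operands == b.operands;
}

// Backend-side code generation: the pointer is simply dereferenced and called.
void plant(const PolyNode& n, std::vector<std::string>& out) {
    assert(n.generate != nullptr);
    n.generate(n, out);
}
```

The kernel-side function never inspects what genShl64 or genMffs do; it only compares pointers, which is consistent with the nodes exposing minimal information about their function.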


Polymorphic nodes are used for subject instructions
which can be expressed by a series of carefully chosen
target instructions, removing the need for the kernel 32
to determine the best target instructions at run-time.
When polymorphic nodes are not realized by the frontend 31,
which uses base nodes, the kernel 32 may choose to
realize these nodes as polymorphic nodes.

Furthermore, polymorphic nodes can contain register
allocation hints. As the target instructions are known,
the respective registers that may be required on CISC
architectures may also be known. Polymorphic nodes allow
their operands and results to appear in registers chosen
at the time of IR construction.

In order for the translator 30 to utilize polymorphic
nodes (i.e., the path traveling through polymorphic IR
block 212 in Figure 5), the backend 33 provides a list of
subject instruction-target function pointer pairs to the
frontend 31. Subject instructions that are on the provided
list are realized as polymorphic nodes containing the
corresponding backend 33 function pointer. Subject
instructions that are not on the list are realized as
complex or base IR trees as discussed above. In Figure 5,
the path reflected by the arrow 214 from the backend 33 to
the frontend 31 shows the list of subject
instruction-target function pointer pairs being provided
to the realize block 215 at the frontend 31. While the
frontend 31 performs realization in the realize block 215
(i.e., mapping of subject instructions to IR nodes), the
process is modified by information received from the
backend 33 through path 214.
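
The list passed along path 214 might, as a minimal sketch, be modelled as a lookup table from subject mnemonics to backend function pointers. The table type, the mnemonic strings, and the functions backendPolyTable and realize are assumptions of this sketch and do not describe the actual interface between the backend 33 and the frontend 31.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Kind { BaseOrComplex, Polymorphic };

// Assumed signature of a backend code generation function.
using CodeGenFn = const char* (*)();

// Hypothetical result of realization for one subject instruction.
struct IRNode {
    Kind kind;
    CodeGenFn generate;  // set only for polymorphic nodes
};

// Stand-in for a backend generator specialized to the subject SHL64 instruction.
const char* genShl64() { return "target sequence for SHL64"; }

// The list of subject instruction-target function pointer pairs.
using PolyTable = std::map<std::string, CodeGenFn>;

// Backend side: publishes the pairs it has specialized.
PolyTable backendPolyTable() {
    return PolyTable{{"SHL64", &genShl64}};
}

// Frontend side: instructions on the list become polymorphic nodes; all
// others fall back to the ordinary base or complex realization.
IRNode realize(const std::string& mnemonic, const PolyTable& table) {
    assert(!mnemonic.empty());
    auto it = table.find(mnemonic);
    if (it != table.end()) return IRNode{Kind::Polymorphic, it->second};
    return IRNode{Kind::BaseOrComplex, nullptr};
}
```

With this arrangement the frontend needs no knowledge of the target architecture beyond the table it is handed, which is one plausible reading of how realization is "modified by information received from the backend".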

In the polymorphic IR block 212 of the kernel 32,
polymorphic nodes can still participate in generic
optimizations, because the kernel 32 can infer their
semantics from the function pointers in each node. In the
backend 33, the target function pointers which point to
target code generation functions are simply dereferenced
and executed. This situation is different from the base
node and complex node cases where the backend 33 maps
particular IR nodes to particular code generation
functions. With polymorphic nodes, the polymorphic
function is encoded directly in the node itself, so that
the backend 33 performs less computation. In Figure 5,
this difference is shown by the fact that the polymorphic
plant block 216 is contiguous with both the polymorphic IR
block 212 and the backend 33 (i.e., no arrows designating
nontrivial computations are shown between the polymorphic
IR block 212 and the polymorphic plant block 216).



Example 1: Polymorphic IR Example 



To illustrate the process of optimizing the translator
30 for utilizing polymorphic nodes in the IR, the
following example describes the translation of a PPC
(PowerPC) "SHL64" instruction (left shift, 64 bit) required
in a PPC-P4 (PowerPC to Pentium4) translator using first
base nodes and then polymorphic nodes.

Without optimizing the translator for the 
implementation of polymorphic nodes, the translation of 
the PPC SHL64 instruction would use only base nodes: 

PPC SHL64 => Base IR multiple nodes => P4 multiple 
instructions 

The frontend decoder 200 of an unoptimized translator
decodes the current block and encounters the PPC SHL64
instruction. Next, the frontend realize block 202
instructs the kernel 32 to construct an IR consisting of
multiple base nodes. Then the kernel 32 optimizes the IR
forest (generated from the current block of instructions)
and performs an ordering traversal to determine the order
of code generation in Base IR block 204. Next, the kernel
32 performs code generation for each IR node in order,
instructing the backend 33 to plant appropriate RISC type
instructions. Finally, the backend 33 plants code in plant
block 206 and encodes each RISC type instruction with one
or more target architecture instructions in encode block
208.

When optimized for a specific target architecture by
specialization of the frontend 31 and backend 33 for
performance critical instructions:

PPC SHL64 => Poly IR single node => P4 single/few
instructions



The frontend decoder 200 of the optimized translator
30 decodes the current block and encounters the PPC SHL64
instruction. Next, the frontend realize block 202
instructs the kernel 32 to construct an IR consisting of a
single polymorphic IR node. When the single polymorphic
node is created, the backend 33 knows that the shift
operand of SHL64 must be in a specific register (%ecx on
P4). This requirement is encoded in the polymorphic node.
Then the kernel 32 optimizes the IR forest for the current
block and performs an ordering traversal to fix the code
generation order in the polymorphic IR block 212. Next,
the kernel 32 performs code generation for each node,
instructing the backend 33 to plant appropriate RISC type
instructions. During code generation, however, polymorphic
nodes are treated differently than base nodes. Each
polymorphic node causes the invocation of a specialized
code generator function which resides in the backend 33.
The backend 33 specialized code generator function plants
code in plant block 216 and encodes each subject
architecture instruction with one or more target
architecture instructions in encode block 208. During
register allocation in the generation phase, the specific
register information is used to allocate the correct
register. This reduces the computation performed by the
backend 33 which would be required if unsuitable registers
had been allocated. This code generation may involve
register allocation for temporary registers.

Example 2: Difficult Instructions 


The following example illustrates the translation and
optimization of the PPC MFFS instruction (move 32 bit FPU
control register to 64 bit general FPU register) which
would be performed by the translator 30 of the present
invention. This subject instruction is too complex to be
represented by base nodes.

In the unoptimized case, this instruction would be
translated using a substitute function. Substitute
functions are explicit translations for special cases of
subject instructions that are particularly difficult to
translate using the standard translation scheme.
Substitute function translations are implemented as target
code functions that perform the semantics of the subject
instruction. They incur a much higher execution cost than
the standard IR instruction based translation scheme. The
unoptimized translation scheme for this instruction is
thus:

PPC MFFS instruction => Base IR substitute function =>
P4 substitute function

In a translator 30 using polymorphic IR, such special
case instructions are translated using a polymorphic node.
The polymorphic node's function pointer provides a more
efficient mechanism for the backend 33 to supply a custom
translation of the difficult subject instruction. The
optimized translation scheme for the same instruction is
thus:

PPC MFFS instruction => single Polymorphic IR node =>
P4 SSE2 instructions

Architecture Specific Nodes 

In another preferred embodiment of the translator 30
of the present invention, the translator 30 may utilize
architecture specific nodes (ASNs), as shown in Figure 5,
which are specific to particular architectures (i.e., a
particular source architecture-target architecture
combination). Each architecture specific node (ASN) is
specifically tailored to a particular instruction, thus
rendering ASNs specific to particular architectures. When
utilizing the ASN mechanism, architecture specific
optimizations can be implemented which comprehend the
ASNs' semantics and can therefore operate on the ASNs.

IR nodes may contain up to three components: a data
component, an implementation component, and a conversion
component. The data component holds any semantic
information which is not inherent in the node itself
(e.g., the value of a constant immediate instruction
field). The implementation component performs code
generation, and, therefore, is specifically related to a
particular architecture. The conversion component converts
the node into IR nodes of a different type, either ASN
nodes or base nodes. In a given implementation of the
present invention in a translator, each base node and ASN
in the generated IR contains either a conversion component
or an implementation component, but not both.



Each base node has an implementation component which 
is specific to the target architecture. Base nodes do not 
have conversion components, because base nodes encode the 
least possible amount of semantic information in the IR 
node hierarchy, thus converting base nodes into other 
types of IR nodes would not provide any benefit. Any such 
conversion of base nodes into other types of IR nodes 
would require the recollection of semantic information 
through idiom recognition. 

The implementation component of an ASN is specific to
the node's architecture, such that it generates an
architecture specific instruction corresponding to that
ASN. For example, the implementation component of a
MIPSLoad ASN generates a MIPS "ld" (load) instruction.
When using the translator of the present invention with
the same subject and target architectures (i.e., as an
accelerator), subject ASNs will possess implementation
components. When utilizing the translator with different
subject and target architectures, subject ASNs will have
conversion components.

For example, Figure 7 illustrates the ASN for a MIPS
instruction when using an embodiment of the present
invention in a MIPS-MIPS accelerator. The frontend 31
decodes the MIPS "addi" (add immediate) instruction 701
and generates an IR to include the corresponding ASN,
MIPS_ADDI 703. The subject and target architectures are
the same for an accelerator, and thus the conversion
component "CVT" 707 is undefined. The implementation
component "IMPL" 705 is defined to generate the same MIPS
"addi" instruction 709, subject to register allocation
differences in the code generation pass.

Figure 8 illustrates the ASNs in the IR for the same
MIPS instruction when using an embodiment of the present
invention in a MIPS-X86 translator. The frontend 31
decodes the MIPS "addi" subject instruction and generates
a corresponding subject ASN, MIPS_ADDI 801. The source and
target architectures are different for this translator,
and the implementation component 803 of the subject ASN
801 is thus undefined. The conversion component 805 of the
MIPS_ADDI is a specialized conversion component, which
converts the subject ASN 801 into a target ASN 807. By
comparison, a generic conversion component would convert
the subject ASN 801 into a base node representation. The
target ASN representation of the MIPS_ADDI node 801 is a
single X86_ADDI node 807. The conversion component 811 of
the target ASN 807 is undefined. The implementation
component 809 of the target ASN 807 generates a target
instruction 813, in this case the X86 instruction "ADD
$EAX, #10."
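
The arrangements of Figures 7 and 8 may be sketched in C++ as a node carrying a data component together with an optional implementation component and an optional conversion component. The AsnNode structure, the use of std::function, the factory functions, and the fixed register names in the emitted text are all assumptions of this sketch; actual register allocation is omitted.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>

struct AsnNode;
using AsnPtr = std::shared_ptr<AsnNode>;

// Hypothetical ASN: one data component and at most one of IMPL or CVT.
struct AsnNode {
    std::string name;                                  // e.g. "MIPS_ADDI"
    long imm = 0;                                      // data component
    std::function<std::string(const AsnNode&)> impl;   // implementation component
    std::function<AsnPtr(const AsnNode&)> cvt;         // conversion component
};

// Target ASN of Figure 8: implementation defined, conversion undefined.
AsnPtr makeX86Addi(long imm) {
    auto n = std::make_shared<AsnNode>();
    n->name = "X86_ADDI";
    n->imm = imm;
    n->impl = [](const AsnNode& self) {
        return "ADD $EAX, #" + std::to_string(self.imm);
    };
    return n;
}

// Subject ASN: in an accelerator it implements itself (Figure 7); in a
// MIPS-X86 translator it converts into a target ASN instead (Figure 8).
AsnPtr makeMipsAddi(long imm, bool accelerator) {
    auto n = std::make_shared<AsnNode>();
    n->name = "MIPS_ADDI";
    n->imm = imm;
    if (accelerator) {
        n->impl = [](const AsnNode& self) {
            return "addi r1, #" + std::to_string(self.imm);
        };
    } else {
        n->cvt = [](const AsnNode& self) { return makeX86Addi(self.imm); };
    }
    return n;
}

// Converts until a node with an implementation component is reached, then
// invokes that component to produce the target instruction text.
std::string generate(AsnPtr node) {
    while (!node->impl) {
        assert(node->cvt && "a node needs a conversion or an implementation");
        node = node->cvt(*node);
    }
    return node->impl(*node);
}
```

In both configurations each node has exactly one of the two optional components defined, which mirrors the statement that a node contains either a conversion component or an implementation component, but not both.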

When the translator 30 is utilizing ASNs, all subject
instructions are realized as subject specific ASNs. In
Figure 5, the fact that the frontend decode block 200, the
ASN realize block 218, and the subject ASN block 220 are
contiguous with each other represents the fact that the
ASNs are defined by the frontend 31 and that realization
is trivial, because there is a one to one relationship
between subject instruction types and subject ASN types.
The frontend 31 contains subject specific optimizations
which understand the semantics of, and can operate on,
subject ASNs. In other words, the subject code is
initially realized as an IR forest consisting entirely of
subject ASNs, to which subject specific optimizations are
then applied.

By default, a subject ASN has a generic conversion
component which generates an IR tree of base nodes. This
allows support for a new subject architecture to be
implemented quickly using generic IR nodes. Subject ASNs
are realized as base nodes through the path extending
through the ASN Base IR block 222 and plant block 206 in
Figure 5, and these base nodes are translated into target
code in a similar manner to other base nodes as described
in detail above.

For subject instructions that are significant to
performance, the corresponding subject ASN nodes provide
specialized conversion components, which generate IR trees
of target ASN nodes. Factors considered in whether to
implement a specialized conversion component include (1)
whether the target architectural features provide for
particularly efficient translation that would be lost in a
base node translation and (2) whether a subject
instruction occurs with such frequency that it has a
significant impact on performance. These specialized
conversion components are specific to the subject-target
architecture pair. Target ASNs (which by definition have
the same architecture as the target) include
implementation components.

When implementing the specialized conversion
components, the corresponding subject ASN nodes provide
target specialized conversion components which convert the
subject ASNs into target ASNs through the target ASN block
224. The target ASN's implementation component is then
invoked to perform code generation in the target ASN plant
block 226. Each target ASN corresponds to one particular
target instruction, such that the code generated from a
target ASN is simply the corresponding target instruction
that the ASN encodes. As such, code generation using
target ASNs is computationally minimal (reflected in
Figure 5 by the illustration of the target ASN plant block
226 being contiguous with both the target ASN block 224
and the encode block 208 in the backend 33, with no arrows
designating nontrivial computations being shown between
these components). Furthermore, the IR traversal,
conversion, and code generation processes are all
controlled by the kernel 32.

Figure 9 illustrates the translation process performed
in accordance with a preferred embodiment of the
translator of the present invention that utilizes the ASN
mechanism. In the frontend 31, the translator decodes the
subject code 901 in step 903 into subject ASNs 904. The
translator performs subject specific optimizations in step
905 on the IR tree made up of subject ASNs. Each subject
ASN 904 is then converted in step 907 into target
compatible IR nodes (target ASNs 911) by invoking the
subject ASN's conversion component. Subject ASN nodes
which have generic conversion components by default are
converted into base nodes 909. Subject ASN nodes which
have specialized conversion components, as provided by the
backend 925, are converted into target ASNs 911. The
conversion thus produces a mixed IR forest 913, containing
both base nodes 909 and target ASNs 911. In the kernel 32,
the translator performs generic optimizations in step 915
on the base nodes in mixed IR forest 913. The translator
then performs target specific optimizations in step 916 on
the target ASNs in the mixed IR forest 913. Finally, code
generation invokes the implementation component of each
node in the mixed tree (both base nodes and target ASN
nodes have implementation components) in step 917, which
then generates target code 919.
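
The conversion of step 907 and the resulting mixed IR forest 913 can be outlined, in heavily simplified form, as follows. This sketch assumes that each subject ASN converts to a single node (a real generic conversion would yield a tree of base nodes), and the type names, the set of specialized mnemonics, and the placeholder output strings are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

enum class Kind { BaseNode, TargetAsn };

// Hypothetical node of the mixed IR forest.
struct IrNode {
    Kind kind;
    std::string subject;  // mnemonic of the subject ASN it was converted from
};

// Step 907 (sketch): subject ASNs with a specialized conversion component
// become target ASNs; all others take the generic path to base nodes.
std::vector<IrNode> convert(const std::vector<std::string>& subjectAsns,
                            const std::set<std::string>& specialized) {
    std::vector<IrNode> mixed;
    for (const std::string& s : subjectAsns) {
        assert(!s.empty());
        Kind k = specialized.count(s) ? Kind::TargetAsn : Kind::BaseNode;
        mixed.push_back(IrNode{k, s});
    }
    return mixed;
}

// Steps 915 and 916 (sketch): each optimization pass visits only the nodes
// of its own kind; here the pass merely counts them.
std::size_t countKind(const std::vector<IrNode>& forest, Kind kind) {
    std::size_t n = 0;
    for (const IrNode& node : forest) {
        if (node.kind == kind) ++n;
    }
    return n;
}

// Step 917 (sketch): every node in the mixed forest has an implementation
// component, represented here by a placeholder string.
std::vector<std::string> generateCode(const std::vector<IrNode>& forest) {
    std::vector<std::string> out;
    for (const IrNode& node : forest) {
        out.push_back((node.kind == Kind::TargetAsn ? "target-asn:" : "base:") +
                      node.subject);
    }
    return out;
}
```

For instance, converting the sequence {"ADDI", "LW", "ADDI"} with only "ADDI" specialized would give a forest of two target ASNs and one base node, all of which then pass through the same code generation step.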

In the special case of a code accelerator, the subject 
and target architectures are both the same. In this 
scenario, subject ASNs persist throughout translation. In 
the frontend 31, decoding generates subject ASNs from the 
subject instructions. In the kernel 32, the subject ASNs 
are passed through architecture specific optimizations. 
Code generation invokes the subject ASNs' implementation 
components to generate the corresponding instructions. As
such, in a code accelerator the use of ASNs prevents code 
explosion, by ensuring a minimum subject to target 
instruction conversion ratio of 1:1, which can be 
increased by optimizations. 

The various embodiments of the translator of the
present invention can be configured for specific
translator applications (i.e., particular subject
architecture-target architecture pairs). As such, the
translator of the present invention is configurable to
convert subject code designed to run on any subject
architecture to target code executable on any target
architecture. Across multiple translator applications,
each base node has multiple implementation components, one
for each supported target architecture. The particular
configuration being undertaken (i.e., conditional
compilation) determines which IR nodes and which
components of those nodes to include in a particular
translator application.

The use of ASNs in a preferred embodiment of the 
present invention provides a plurality of advantageous 
benefits. First, a translator product built from scratch
can be developed quickly using generic IR implementations 
of subject instructions. Second, existing translator 
products can be incrementally augmented, by implementing 
target specific conversion components for subject 
instructions that are critical to performance (as known 
beforehand or as empirically determined). Third, as more 
translator products are developed, the library of ASN 
nodes (and implemented functionality) grows over time, so 
future translator products can be implemented or optimized 
quickly. 

This embodiment of the present invention allows backend
implementations to pick and choose which subject
instructions are worth optimizing (by defining
target-specialized conversion components). The generic
conversion component allows an ASN-based translator to be
developed quickly, while the specialized conversion
components allow performance critical instructions to be
selectively and incrementally optimized.

Example 3: Difficult Instructions Using ASN 

Returning to the PowerPC SHL64 instruction of Example
1 above, the translator 30 using ASNs performs the
following steps. The frontend decoder 200 decodes the
current block and encounters the PowerPC SHL64
instruction. The frontend 31 then realizes a single ASN
for that instruction, SHL64_PPC_P4. The kernel 32 then
optimizes the IR for the current block of instructions and
performs an ordering traversal of the IR in preparation
for code generation. The kernel 32 then performs code
generation for the ASN nodes by invoking each particular
ASN node's code generator function, which is an element of
the implementation component. The backend 33 then encodes
subject architecture (PPC) instructions into one or more
target architecture (P4) instructions.

MIPS Examples 

Referring now to Figures 10, 11 and 12, examples
illustrating the different IR trees that are generated
from the same MIPS instruction sequence using base IR
nodes, MIPS-X86 ASN IR nodes, and MIPS-MIPS ASN IR nodes,
respectively, are shown. The semantics of the example MIPS
subject instruction sequence (load upper immediate, then
bitwise-or immediate) is to load the 32 bit constant value
0x12345678 into subject register "a1".

In Figure 10, the Binary Decoder 300 is a frontend 31
component of the translator 30 which decodes (parses) the
subject code into individual subject instructions. After
the subject instructions are decoded, they are realized as
base nodes 302 and added to the working IR forest for the
current block of instructions. The IR Manager 304 is the
portion of the translator 30 that holds the working IR
forest during IR generation. The IR Manager 304 consists
of abstract registers and their associated IR trees (the
roots of the IR forest are abstract registers). For
example, in Figure 10, the abstract register "a1" 306 is
the root of an IR tree 308 of five nodes, which is part of
the current block's working IR forest. In a translator 30
implemented in C++, the IR Manager 304 may be implemented
as a C++ object that includes a set of abstract register
objects (or references to IR node objects).


Figure 10 illustrates an IR tree 308 generated by a
MIPS to X86 translator using base nodes only. The
MIPS_LUI instruction 310 realizes a "SHL" (shift left) base
node 314 with two operand nodes 316 and 318, which in this
case are both constants. The semantics of the MIPS_LUI
instruction 310 are to shift a constant value (0x1234)
left by a constant number of bits (16). The MIPS_ORI
instruction 312 realizes an "ORI" (bitwise-or immediate)
base node 320 with two operand nodes 314 and 322, the
result of the SHL node 314 and a constant value. The
semantics of the MIPS_ORI instruction 312 are to perform a
bitwise-or of the existing register contents with a
constant value (0x5678).

In an unoptimized code generator, the base nodes
include no immediate-type operators other than load
immediate, so each constant node results in the generation
of a load immediate instruction. The unoptimized base node
translator therefore requires five RISC type operations
(load, load, shift, load, or) for this subject
instruction sequence. Backend 33 idiom recognition can
reduce this number from five to two, by coalescing the
constant nodes with their parent nodes, to generate
immediate type target instructions (i.e., shift immediate
and or immediate). This reduces the number of target
instructions to two, but at an increased translation cost
in performing the idiom recognition in the code generator.
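
The operation counts stated above can be reproduced with a simplified cost model, sketched here in C++. The model assumes that every node emits one operation unless constant coalescing is enabled, in which case constant nodes are absorbed into their parents as immediates; the Node type, the operator strings, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base node for this sketch.
struct Node {
    std::string op;  // "const", "shl" or "or"
    std::vector<std::shared_ptr<Node>> kids;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr node(std::string op, std::vector<NodePtr> kids = {}) {
    assert(!op.empty());
    auto n = std::make_shared<Node>();
    n->op = std::move(op);
    n->kids = std::move(kids);
    return n;
}

// or(shl(const 0x1234, const 16), const 0x5678): the five node tree.
NodePtr buildLuiOriTree() {
    NodePtr shl = node("shl", {node("const"), node("const")});
    return node("or", {shl, node("const")});
}

// Counts emitted operations. Without coalescing, every node costs one
// operation (a constant costs a load immediate). With coalescing, constants
// are folded into their parent as immediates and cost nothing on their own.
std::size_t countOperations(const Node& n, bool coalesceConstants) {
    std::size_t total = (n.op == "const" && coalesceConstants) ? 0 : 1;
    for (const NodePtr& k : n.kids) total += countOperations(*k, coalesceConstants);
    return total;
}
```

Under this model the tree costs five operations without coalescing and two with it, in line with the figures given in the text; the model says nothing about the translation time spent performing the coalescing itself.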



Using complex nodes in the IR can realize immediate
type IR nodes, which eliminates the need to perform idiom
recognition in the backend 33 and reduces the translation
cost of the code generator. Complex nodes preserve more of
the semantics of the original subject instructions, and,
with fewer IR nodes being realized, the translation cost
of node generation is also reduced when using complex
nodes.

Figure 11 illustrates the IR tree generated by a
MIPS-X86 (MIPS to X86) translator using ASNs. After the
subject instructions are decoded by the binary decoder
300, they are realized as MIPS_X86 ASN nodes 330, which
are then added to the working IR forest for the current
block. First, the MIPS_X86_LUI ASN node is converted into
an X86 32-bit constant node 332 by the ASN's convert
component. Second, the MIPS_X86_ORI ASN node produces an
X86 ORI node which is immediately folded with the previous
X86 constant node (constant folding), resulting in a
single X86 32-bit constant node 334. This node 334 is
encoded into a single X86 load constant instruction, "mov
%eax, $0x12345678". As can be seen, ASN nodes result in
fewer nodes than the base node example, thus reducing
translation cost and providing better target code.
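
The constant folding described for Figure 11 amounts to the arithmetic (0x1234 << 16) | 0x5678 = 0x12345678, which the following C++ sketch carries out. The X86Const type and the three function names are hypothetical, and the destination register %eax is fixed here only because it appears in the example instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical X86 32-bit constant node.
struct X86Const {
    std::uint32_t value;
};

// Conversion of the LUI ASN: the 16 bit immediate becomes the upper half.
X86Const convertLui(std::uint32_t imm16) {
    assert(imm16 <= 0xFFFFu);
    return X86Const{imm16 << 16};
}

// Conversion of the ORI ASN when its source is already a constant node:
// the two are folded into a single constant node.
X86Const foldOri(X86Const source, std::uint32_t imm16) {
    assert(imm16 <= 0xFFFFu);
    return X86Const{source.value | imm16};
}

// Encodes the folded constant as one load constant instruction.
std::string encodeLoadConstant(X86Const c) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "mov %%eax, $0x%08x",
                  static_cast<unsigned>(c.value));
    return std::string(buf);
}
```

Calling encodeLoadConstant(foldOri(convertLui(0x1234), 0x5678)) yields the single instruction text "mov %eax, $0x12345678" quoted above.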

Figure 12 illustrates an IR tree generated by a
MIPS-MIPS translator (i.e., a MIPS accelerator) using ASNs.
After the subject instructions 310, 312 are decoded by the
binary decoder 300, they are realized as MIPS_MIPS ASN
nodes 340, which are then added to the working IR forest
for the current block. Because the source and target
architectures are the same for the MIPS-MIPS translator,
the MIPS_MIPS_LUI and MIPS_MIPS_ORI ASN nodes 340 have
null (undefined) convert components. As such, there is a
direct correspondence between the subject instructions and
the final IR nodes used to generate code. This guarantees
a 1:1 subject to target instruction translation ratio,
even before any optimizations are applied. In other words,
ASN nodes eliminate code explosion for same-same
translators (accelerators). ASN nodes also allow 16 bit
constant nodes to be shared, which is useful for efficient
translation of contiguous memory accesses on the MIPS
platform.

Basic blocks c?f instructions are translated " one 
subject instruction at a time. Each subject instruction 
results in the formation of (realizes) an IR tree. After 

15 the IR tree for a given instruction is created, it is then 
integrated into the working IR forest for the current, 
block. The roots of the working IR forest are abstract 
registers, which correspond to the subject registers and 
other features of the subject architecture. When the last 

20 subject instruction has been decoded, realized, and its IR 
tree integrated with the working IR .forest, the IR forest 
for that block is complete- 
In Figure 12, the first subject instruction 310 is 

25 u lui al, OxI234" . The semantics of this instruction 310 
.are to load the- constant value OxI234 .into the upper 16 
bits of subject \_ register w ai:' 342\ - This instruction- 310 
~ -realizes, a MIPS_MIPS_LUf Inode 344 7"with ~an_ immediate' field 
constant- value of OxI234. The translator adds this node to 

30 the working- IR forest by setting abstrajct register- "al" 
"342 (the destination register of the subject instruction) 
to point to the MIPSJVIIPS__LUI IR node 344. : 



In the same example in Figure 12, the second subject
instruction 312 is "ori a1, a1, 0x5678". The semantics of
this instruction 312 are to perform a bitwise-or of the
constant value 0x5678 with the current contents of subject
register "a1" 342 and to store the result in subject
register "a1" 346. This instruction 312 realizes a
MIPS_MIPS_ORI node 348, with an immediate field constant
value of 0x5678. The translator adds this node to the
working IR forest by first setting the ORI node to point
to the IR tree that is currently pointed to by abstract
register "a1" 342 (the source register of the subject
instruction), and then setting the abstract register "a1"
346 (the destination register of the subject instruction)
to point to the ORI node 348. In other words, the existing
"a1" tree rooted with abstract register 342 (i.e., the LUI
node) becomes a subtree 350 of the ORI node 348, and then
the ORI node 348 becomes the new "a1" tree. The old "a1"
tree (after LUI but before ORI) is rooted from abstract
register 342 and shown as linked by line 345, while the
current "a1" tree (after ORI) is rooted from abstract
register 346.
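
The integration of the LUI and ORI trees into the working
IR forest may be illustrated by the following minimal
sketch. The IRNode class, the dictionary used as the
forest, and the evaluate helper are assumptions made for
this example only; they are not the data structures of the
described translator, and evaluate is included merely to
check that the resulting tree carries the expected MIPS
semantics.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IRNode:
    """Hypothetical IR node: an opcode, an immediate, and one optional child tree."""
    op: str
    imm: int
    child: Optional["IRNode"] = None


def evaluate(node: IRNode) -> int:
    """Interpret a tree using the LUI/ORI semantics described in the text."""
    if node.op == "LUI":
        # Load the 16 bit constant into the upper half of a 32 bit value.
        return (node.imm << 16) & 0xFFFFFFFF
    if node.op == "ORI":
        source = evaluate(node.child) if node.child else 0
        return source | node.imm
    raise ValueError(f"unknown op {node.op}")


# Working IR forest: abstract register name -> root of its current IR tree.
forest: Dict[str, IRNode] = {}

# lui a1, 0x1234: realize the node, then point abstract register a1 at it.
lui = IRNode("LUI", 0x1234)
forest["a1"] = lui

# ori a1, a1, 0x5678: first link the source tree as a child of the new node,
# then redirect the destination register to the new node.
ori = IRNode("ORI", 0x5678, child=forest["a1"])
forest["a1"] = ori
```

The ordering matters in this sketch: the source tree is read
from the forest before the destination entry is overwritten,
so that the old "a1" tree survives as a subtree of the new
one.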


As can be seen from the foregoing, an improved program
code conversion apparatus formed in accordance with the
present invention is configurable to any subject and
target processor architecture pairing while maintaining an
optimal level of performance and balancing the speed of
translation with the efficiency of the translated target
code. Moreover, depending upon the particular
architectures of the subject and target computing
environments involved in the conversion, the program code
conversion apparatus of the present invention can be
designed with a hybrid design of generic and specific
conversion features by utilizing a combination of base
nodes, complex nodes, polymorphic nodes, and architecture
specific nodes in its intermediate representation.
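
Purely as a hedged illustration of such a hybrid design, the
choice of node type for each decoded instruction might be
sketched as below. The function name, the boolean flag, the
two lookup sets, and in particular the order in which the
checks are made are all assumptions of this sketch; the
specification does not prescribe this priority.

```python
from enum import Enum
from typing import FrozenSet


class NodeKind(Enum):
    BASE = "base"
    COMPLEX = "complex"
    POLYMORPHIC = "polymorphic"
    ARCH_SPECIFIC = "architecture specific"


def choose_node_kind(
    mnemonic: str,
    pairing_uses_asn: bool,
    polymorphic_list: FrozenSet[str],
    immediate_forms: FrozenSet[str],
) -> NodeKind:
    """Pick an IR node kind for one decoded subject instruction.

    The priority order below is illustrative only.
    """
    if pairing_uses_asn:
        # Translator configured for a specific subject/target pairing.
        return NodeKind.ARCH_SPECIFIC
    if mnemonic in polymorphic_list:
        # Instruction appears on the (hypothetical) list of polymorphic instructions.
        return NodeKind.POLYMORPHIC
    if mnemonic in immediate_forms:
        # Immediate type instruction: represent compactly as a complex node.
        return NodeKind.COMPLEX
    return NodeKind.BASE


kinds = [
    choose_node_kind(m, False, frozenset({"fsqrt"}), frozenset({"ori"}))
    for m in ("fsqrt", "ori", "add")
]
```

Here the mnemonics "fsqrt", "ori" and "add" are example
inputs only and are not taken from any list in the
specification.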

The different structures of the improved program code
conversion apparatus of the present invention are
described separately in each of the above embodiments.
However, it is the full intention of the inventors of the
present invention that the separate aspects of each
embodiment described herein may be combined with the other
embodiments described herein. For instance, the translator
formed in accordance with the present invention may
comprise hybrid optimizations of various IR types. Those
skilled in the art will appreciate that various
adaptations and modifications of the just described
preferred embodiment can be configured without departing
from the scope and spirit of the invention. Therefore, it
is to be understood that, within the scope of the appended
claims, the invention may be practiced other than as
specifically described herein.

Although a few preferred embodiments have been shown
and described, it will be appreciated by those skilled in
the art that various changes and modifications might be
made without departing from the scope of the invention, as
defined in the appended claims.

Attention is directed to all papers and documents
which are filed concurrently with or previous to this
specification in connection with this application and
which are open to public inspection with this
specification, and the contents of all such papers and
documents are incorporated herein by reference.

All of the features disclosed in this specification
(including any accompanying claims, abstract and
drawings), and/or all of the steps of any method or
process so disclosed, may be combined in any combination,
except combinations where at least some of such features
and/or steps are mutually exclusive.

Each feature disclosed in this specification
(including any accompanying claims, abstract and drawings)
may be replaced by alternative features serving the same,
equivalent or similar purpose, unless expressly stated
otherwise. Thus, unless expressly stated otherwise, each
feature disclosed is one example only of a generic series
of equivalent or similar features.

The invention is not restricted to the details of the
foregoing embodiment(s). The invention extends to any
novel one, or any novel combination, of the features
disclosed in this specification (including any
accompanying claims, abstract and drawings), or to any
novel one, or any novel combination, of the steps of any
method or process so disclosed.

CLAIMS

1. A method of generating an intermediate
representation of program code, comprising the steps of:

decoding instructions in the program code;

generating an intermediate representation (IR) of the
decoded program code to include at least one type of IR
nodes out of a plurality of possible types of IR nodes;
and

determining which type of IR nodes to generate in the
intermediate representation for each respective
instruction in the decoded program code, wherein the IR
nodes in the intermediate representation (IR) are abstract
representations of the expressions, calculations, and
operations performed by the program code.

2. The method of claim 1, wherein the plurality of
possible types of IR nodes include base nodes and complex
nodes.

3. The method of claim 2, wherein base nodes
represent the most basic semantics of any subject
architecture running the program code, such that the
semantics of base nodes cannot be decomposed into other
nodes representing more simple semantics.

4. The method of claim 3, wherein base nodes are
generic across a plurality of possible subject
architectures.

5. The method of claim 3 or 4, wherein complex nodes
provide a more compact representation of the semantics of
complex instructions in the program code than that of base
node representations.

6. The method of claim 5, wherein complex nodes
represent immediate type instructions in which a constant
operand value is encoded into the immediate type
instruction itself in an immediate field.

7. The method of claim 5 or 6, wherein a complex node
may be decomposed into a plurality of base nodes to
represent the same semantics of an instruction in the
decoded program code.

8. The method of claim 5, 6 or 7, wherein the program
code is designed to be executed by a subject architecture,
the method further comprising the step of generating
complex nodes only for those features correspondingly
configurable on the subject architecture.

9. The method of claim 2 or any claim dependent
thereon, wherein the plurality of possible types of IR
nodes further include polymorphic nodes.

10. The method of claim 9, wherein the program code is
subject code designed for execution on a subject
architecture and is dynamically translated into target
code for execution on a target architecture, said method
further comprising:

generating the intermediate representation to include
polymorphic nodes, wherein polymorphic nodes contain a
function pointer to a function of the target architecture
specific to a particular instruction in the subject code.

11. The method of claim 10, said method further
comprising generating polymorphic nodes when the features
of the target architecture would cause the semantics of a
particular subject instruction to be lost if realized as
base nodes.

12. The method of claim 10 or 11, wherein each
polymorphic node is specific to a combination of a
particular instruction in the subject code and a function
of the target architecture.

13. The method of claim 10, 11 or 12, wherein said
determining the type of IR nodes step further comprises
identifying an instruction in subject code which
corresponds to an instruction on a list of polymorphic
instructions to be realized as polymorphic nodes; and

when a subject instruction corresponds to an
instruction on the list of polymorphic instructions, said
IR generating step generates polymorphic nodes only for
those subject instructions corresponding to those on the
list of polymorphic instructions.

14. The method of any preceding claim, wherein the
plurality of possible types of IR nodes further include
base nodes and architecture specific nodes.

15. The method of claim 14, wherein the program code
is subject code designed for execution on a subject
architecture and is dynamically translated into target
code for execution on a target architecture, said method
further comprising:

generating the intermediate representation to include
architecture specific nodes which are specific to a
particular combination of a subject architecture and a
target architecture.

16. The method of claim 15, the intermediate
representation generating step further comprising:

initially representing all of the instructions in the
subject code as subject architecture specific nodes, where
each subject architecture specific node corresponds to a
respective instruction in the subject code;

determining whether an instruction in the subject code
is one in which to provide a target architecture
specialized conversion function, converting subject
architecture specific nodes into target architecture
specific nodes for those instructions determined to
provide a target architecture specialized conversion
function; and

generating base nodes from the remaining subject
architecture specific nodes which are not identified as
providing a target architecture specialized code
generation function.

17. The method of claim 16, further comprising
generating corresponding target code from the target
architecture specific nodes which is specialized for the
target architecture.

18. The method of claim 15, 16 or 17, further
comprising generating corresponding target code from the
base nodes which is not specialized for the target
architecture.

19. A computer readable recording medium containing
program code for performing the method of any preceding
claim.

20. A computer readable storage medium having
translator software resident thereon in the form of
computer readable code executable by a computer to perform
the following steps during translation of subject program
code to target program code:

decoding instructions in the subject program code;

generating an intermediate representation (IR) of the
decoded subject program code to include at least one type
of IR nodes out of a plurality of possible types of IR
nodes;

determining which type of IR nodes to generate in the
intermediate representation for each respective
instruction in the decoded subject program code, wherein
the IR nodes in the intermediate representation (IR) are
abstract representations of the expressions, calculations,
and operations performed by the program code; and

generating target program code using the intermediate
representation (IR).

21. The computer readable storage medium of claim 20,
wherein the plurality of possible types of IR nodes
include base nodes and complex nodes.

22. The computer readable storage medium of claim 21,
wherein base nodes represent the most basic semantics of
any subject architecture running the program code, such
that the semantics of base nodes cannot be decomposed into
other nodes representing more simple semantics.

23. The computer readable storage medium of claim 22,
wherein base nodes are generic across a plurality of
possible subject architectures.

24. The computer readable storage medium of claim 22,
wherein complex nodes provide a more compact
representation of the semantics of complex instructions in
the program code than that of base node representations.

25. The computer readable storage medium of claim 24,
wherein complex nodes represent immediate type
instructions in which a constant operand value is encoded
into the immediate type instruction itself in an immediate
field.

26. The computer readable storage medium of claim 24,
wherein a complex node may be decomposed into a plurality
of base nodes to represent the same semantics of an
instruction in the decoded program code.

27. The computer readable storage medium of claim 24,
wherein the subject program code is designed to be
executed by a subject architecture, the method further
comprising the step of generating complex nodes only for
those features correspondingly configurable on the subject
architecture.

28. The computer readable storage medium of any of
claims 21 to 27, wherein the plurality of possible types
of IR nodes further include polymorphic nodes.

29. The computer readable storage medium of claim 28,
wherein the subject program code is designed for execution
on a subject architecture and is dynamically translated
into target code for execution on a target architecture,
said translator software further containing computer
readable code executable by a computer to perform the
following steps:

generating the intermediate representation to include
polymorphic nodes, wherein polymorphic nodes contain a
function pointer to a function of the target architecture
specific to a particular instruction in the subject code.

30. The computer readable storage medium of claim 29,
said translator software further containing computer
readable code executable by a computer to generate
polymorphic nodes when the features of the target
architecture would cause the semantics of a particular
subject instruction to be lost if realized as base nodes.

31. The computer readable storage medium of claim 29,
wherein each polymorphic node is specific to a combination
of a particular instruction in the subject code and a
function of the target architecture.

32. The computer readable storage medium of claim 29,
wherein said computer readable code executable by a
computer for determining the type of IR nodes further:

identifies an instruction in subject code which
corresponds to an instruction on a list of polymorphic
instructions to be realized as polymorphic nodes; and

when a subject instruction corresponds to an
instruction on the list of polymorphic instructions,
generates polymorphic nodes only for those subject
instructions corresponding to those on the list of
polymorphic instructions.

33. The computer readable storage medium of any of
claims 20 to 32, wherein the plurality of possible types
of IR nodes further include base nodes and architecture
specific nodes.

34. The computer readable storage medium of claim 33,
wherein the subject program code is designed for execution
on a subject architecture and is dynamically translated
into target code for execution on a target architecture,
said translator software further containing computer
readable code executable by a computer to perform the
following steps:

generating the intermediate representation to include
architecture specific nodes which are specific to a
particular combination of a subject architecture and a
target architecture.

35. The computer readable storage medium of claim 34,
said translator software further containing computer
readable code executable by a computer to perform the
following steps:

initially representing all of the instructions in the
subject code as subject architecture specific nodes, where
each subject architecture specific node corresponds to a
respective instruction in the subject code;

determining whether an instruction in the subject code
is one in which to provide a target architecture
specialized conversion function, converting subject
architecture specific nodes into target architecture
specific nodes for those instructions determined to
provide a target architecture specialized conversion
function; and

generating base nodes from the remaining subject
architecture specific nodes which are not identified as
providing a target architecture specialized code
generation function.

36. The computer readable storage medium of claim 35,
said translator software further containing computer
readable code executable by a computer to generate
corresponding target code from the target architecture
specific nodes which is specialized for the target
architecture.

37. The computer readable storage medium of claim 34,
said translator software further containing computer
readable code executable by a computer to generate
corresponding target code from the base nodes which is not
specialized for the target architecture.

38. A translator apparatus for use in a target
computing environment having a processor and a memory
coupled to the processor for translating subject program
code appropriate in a subject computing environment to
produce target program code appropriate to the target
computing environment, the translator apparatus
comprising:

a decoding mechanism configured to decode instructions
in the subject program code;

an intermediate representation generating mechanism
configured to generate an intermediate representation (IR)
of the decoded program code to include at least one type
of IR nodes out of a plurality of possible types of IR
nodes; and

an intermediate representation (IR) type determining
mechanism configured to determine which type of IR nodes
to generate in the intermediate representation for each
respective instruction in the decoded program code,
wherein the IR nodes in the intermediate representation
(IR) are abstract representations of the expressions,
calculations, and operations performed by the program
code.

39. The translator apparatus of claim 38, wherein the
plurality of possible types of IR nodes include base nodes
and complex nodes.

40. The translator apparatus of claim 39, wherein base
nodes represent the most basic semantics of any subject
architecture running the program code, such that the
semantics of base nodes cannot be decomposed into other
nodes representing more simple semantics.

41. The translator apparatus of claim 40, wherein base
nodes are generic across a plurality of possible subject
architectures.

42. The translator apparatus of claim 40, wherein
complex nodes provide a more compact representation of the
semantics of complex instructions in the program code than
that of base node representations.

43. The translator apparatus of claim 42, wherein
complex nodes represent immediate type instructions in
which a constant operand value is encoded into the
immediate type instruction itself in an immediate field.

44. The translator apparatus of claim 42, wherein a
complex node may be decomposed into a plurality of base
nodes to represent the same semantics of an instruction in
the decoded program code.

45. The translator apparatus of claim 42, wherein the
program code is designed to be executed by a subject
architecture, the intermediate representation generating
mechanism further comprising a complex node generating
mechanism for generating complex nodes only for those
features correspondingly configurable on the subject
architecture.



46. The translator apparatus of claim 39, wherein the
plurality of possible types of IR nodes further include
polymorphic nodes.

47. The translator apparatus of claim 46, wherein the
program code is subject code designed for execution on a
subject architecture and is dynamically translated into
target code for execution on a target architecture, the
intermediate representation generating mechanism further
comprising:

a polymorphic node generating mechanism for generating
the intermediate representation to include polymorphic
nodes, wherein polymorphic nodes contain a function
pointer to a function of the target architecture specific
to a particular instruction in the subject code.

48. The translator apparatus of claim 47, said
polymorphic node generating mechanism generating
polymorphic nodes when the features of the target
architecture would cause the semantics of a particular
subject instruction to be lost if realized as base nodes.

49. The translator apparatus of claim 47, wherein each
polymorphic node is specific to a combination of a
particular instruction in the subject code and a function
of the target architecture.

50. The translator apparatus of claim 47, wherein said
intermediate representation (IR) type determining
mechanism further comprises a polymorphic identification
mechanism for identifying an instruction in subject code
which corresponds to an instruction on a list of polymorphic
instructions to be realized as polymorphic nodes; and

when a subject instruction corresponds to an
instruction on the list of polymorphic instructions, said
intermediate representation generating mechanism generates
polymorphic nodes only for those subject instructions
corresponding to those on the list of polymorphic
instructions.

51. The translator apparatus of claim 38, wherein the
plurality of possible types of IR nodes further include
base nodes and architecture specific nodes.

52. The translator apparatus of claim 51, wherein the
program code is subject code designed for execution on a
subject architecture and is dynamically translated into
target code for execution on a target architecture, said
intermediate representation generating mechanism further
comprising:

an architecture specific node generating mechanism for
generating the intermediate representation to include
architecture specific nodes which are specific to a
particular combination of a subject architecture and a
target architecture.

53. The translator apparatus of claim 52, the
intermediate representation generating mechanism being
configured to:

initially represent all of the instructions in the
subject code as subject architecture specific nodes, where
each subject architecture specific node corresponds to a
respective instruction in the subject code;

determine whether an instruction in the subject code
is one in which to provide a target architecture
specialized conversion function, convert subject
architecture specific nodes into target architecture
specific nodes for those instructions determined to
provide a target architecture specialized conversion
function; and

generate base nodes from the remaining subject
architecture specific nodes which are not identified as
providing a target architecture specialized code
generation function.

54. The translator apparatus of claim 53, further
comprising a specialized target code generating mechanism
for generating corresponding target code from the target
architecture specific nodes which is specialized for the
target architecture.

55. The translator apparatus of claim 52, further
comprising a non specialized target code generating
mechanism for generating corresponding target code from
the base nodes which is not specialized for the target
architecture.

56. The translator apparatus of claim 47, wherein said
generated polymorphic nodes specify the registers to be
allocated during target code generation.

57. The translator apparatus of claim 47, wherein said
generated polymorphic nodes are utilized in generic kernel
optimizations by inferring information from the function
pointer in the polymorphic node which may otherwise be
indeterminable from the polymorphic node.

58. The translator apparatus of claim 50, wherein when
a subject instruction corresponds to an instruction on the
list of polymorphic instructions, said intermediate
representation generating mechanism generates either
polymorphic nodes or base nodes for those subject
instructions corresponding to those on the list of
polymorphic instructions.

59. The method of claim 10, wherein said generated
polymorphic nodes specify the registers to be allocated
during target code generation.

60. The method of claim 10, wherein said generated
polymorphic nodes are utilized in generic kernel
optimizations by inferring information from the function
pointer in the polymorphic node which may otherwise be
indeterminable from the polymorphic node.

61. The method of claim 13, wherein when a subject
instruction corresponds to an instruction on the list of
polymorphic instructions, said intermediate representation
generating step generates either polymorphic nodes or base
nodes for those subject instructions corresponding to
those on the list of polymorphic instructions.

62. The computer-readable storage medium of claim 29,
wherein said generated polymorphic nodes specify the
registers to be allocated during target code generation.

63. The computer-readable storage medium of claim 29,
wherein said generated polymorphic nodes are utilized in
generic kernel optimizations by inferring information from
the function pointer in the polymorphic node which may
otherwise be indeterminable from the polymorphic node.

64. The translator apparatus of claim 32, wherein said
computer readable code executable by a computer for
determining the type of IR nodes further, when a subject
instruction corresponds to an instruction on the list of
polymorphic instructions, generates either polymorphic
nodes or base nodes for those subject instructions
corresponding to those on the list of polymorphic
instructions.

65. A method of translating subject program code
capable of being executed on a subject processor
architecture to target program code capable of being
executed on a target processing architecture using a
translator configurable between a plurality of possible
subject/target processing architecture pairings, said
method comprising:

selecting a subject processor architecture on which
the subject program code is designed to be executed from a
plurality of possible subject processor architectures;

selecting a target processor architecture on which the
target program code is to be executed from a plurality of
possible target processor architectures; and

configuring a translator to translate the subject
program code to target program code using a pairing of the
selected subject processor architecture and the selected
target processor architecture.

66. The method of claim 65, further comprising
translating the subject program code to target program
code dynamically at run-time while the target program code
is being executed on the target processing architecture.

67. The method of claim 65, further comprising:

decoding instructions in the subject program code;

determining which types of intermediate representation
(IR) nodes out of a plurality of possible types of IR
nodes to utilize in an intermediate representation of the
decoded program code for each respective instruction in
the decoded program code based upon the particular
translator configuration being undertaken based on the
pairing of the selected subject processor architecture and
the selected target processor architecture; and

generating an intermediate representation (IR) of the
decoded program code to include at least one type of IR
nodes out of a plurality of possible types of IR nodes;

wherein the IR nodes in the intermediate
representation (IR) are abstract representations of the
expressions, calculations, and operations performed by the
program code.

68. The method of claim 67, further comprising 
generating the intermediate representation (IR) to include 
a combination of generic conversion features and specific 
conversion features, wherein generic conversion features 
are capable of being implemented across a plurality of 
possible processor architectures while specific conversion 
features are capable of being implemented by a specific 
processor architecture. 

69. The method of claim 68, wherein the particular 
translator configuration being undertaken determines the 
respective combination of generic conversion features and 
specific conversion features utilized. 

70. A computer readable storage medium having
translator software resident thereon in the form of
computer readable code executable by a computer for
performing a method of translating subject program code
capable of being executed on a subject processor
architecture to target program code capable of being
executed on a target processing architecture using a
translator configurable between a plurality of possible
subject/target processing architecture pairings, said
method comprising:

selecting a subject processor architecture on which
the subject program code was designed to be executed from
a plurality of possible subject processor architectures;

selecting a target processor architecture on which the
target program code is to be executed from a plurality of
possible target processor architectures; and

configuring a translator to translate the subject
program code to target program code using a pairing of the
selected subject processor architecture and the selected
target processor architecture.

71. The computer-readable storage medium of claim 70,
said translator software further containing computer
readable code executable by a computer to translate the
subject program code to target program code dynamically at
run-time while the target program code is being executed
on the target processing architecture.

72. The computer-readable storage medium of claim 70,
said translator software further containing computer
readable code executable by a computer to perform the
following steps:

decoding instructions in the subject program code;

determining which types of intermediate representation
(IR) nodes out of a plurality of possible types of IR
nodes to utilize in an intermediate representation of the
decoded program code for each respective instruction in
the decoded program code based upon the particular
translator configuration being undertaken based on the
pairing of the selected subject processor architecture and
the selected target processor architecture; and

generating an intermediate representation (IR) of the
decoded program code to include at least one type of IR
nodes out of a plurality of possible types of IR nodes;

wherein the IR nodes in the intermediate
representation (IR) are abstract representations of the
expressions, calculations, and operations performed by the
program code.

73. The computer-readable storage medium of claim 72, 
said translator software further containing computer 
readable code executable by a computer to generate the 
intermediate representation (IR) to include a combination 
of generic conversion features and specific conversion 
features, wherein generic conversion features are capable 
of being implemented across a plurality of possible 
processor architectures while specific conversion features 
are capable of being implemented by a specific processor 
architecture.

74. The computer-readable storage medium of claim 73, 
wherein the particular translator configuration being 
undertaken determines the respective combination of 
generic conversion features and specific conversion 
features utilized. 

75. A translator apparatus for use in a target
computing environment having a processor and a memory
coupled to the processor for translating subject program
code capable of being executed on a subject processor
architecture to target program code capable of being
executed on the target processor architecture of the
target computing environment using a translator
configurable between a plurality of possible
subject/target processing architecture pairings, the
translator apparatus comprising:

a subject processor selecting mechanism configured to
select a subject processor architecture on which the
subject program code was designed to be executed from a
plurality of possible subject processor architectures;

a target processor selecting mechanism configured to
select a target processor architecture on which the target
program code is to be executed from a plurality of
possible target processor architectures; and

a configuration mechanism configured to configure a
translator to translate the subject program code to target
program code using a pairing of the selected subject
processor architecture and the selected target processor
architecture.

76. The translator apparatus of claim 75, further
comprising a translation mechanism configured to translate
the subject program code to target program code
dynamically at run-time while the target program code is
being executed on the target processing architecture.

77. The translator apparatus of claim 75, further
comprising:

a decoding mechanism configured to decode instructions
in the subject program code;

an intermediate representation (IR) type determining
mechanism configured to determine which types of
intermediate representation (IR) nodes out of a plurality
of possible types of IR nodes to utilize in an
intermediate representation of the decoded program code
for each respective instruction in the decoded program
code based upon the particular translator configuration
being undertaken based on the pairing of the selected
subject processor architecture and the selected target
processor architecture; and

an intermediate representation (IR) generating
mechanism configured to generate an intermediate
representation (IR) of the decoded program code to include
at least one type of IR nodes out of a plurality of
possible types of IR nodes;

wherein the IR nodes in the intermediate
representation (IR) are abstract representations of the
expressions, calculations, and operations performed by the
program code.

78. The translator apparatus of claim 77, wherein the
intermediate representation (IR) generating mechanism is
further configured to generate the intermediate
representation (IR) to include a combination of generic
conversion features and specific conversion features,
wherein generic conversion features are capable of being
implemented across a plurality of possible processor
architectures while specific conversion features are
capable of being implemented by a specific processor
architecture.

79. The translator apparatus of claim 78, wherein the
particular translator configuration being undertaken
determines the respective combination of generic
conversion features and specific conversion features
utilized.



80. A method of generating an intermediate
representation of program code, substantially as
hereinbefore described with reference to the accompanying
drawings.



81. A computer readable storage medium having
translator software resident thereon in the form of
computer readable code executable by a computer, wherein
the translator software is arranged to operate
substantially as hereinbefore described with reference to
the accompanying drawings.

82. A translator apparatus for use in a target
computing environment having a processor and a memory
coupled to the processor for translating subject program
code appropriate in a subject computing environment to
produce target program code appropriate to the target
computing environment, the translator apparatus
substantially as hereinbefore described with reference to
the accompanying drawings.
ABSTRACT

IMPROVED ARCHITECTURE FOR GENERATING INTERMEDIATE
REPRESENTATIONS FOR PROGRAM CODE CONVERSION

An improved architecture for a program code conversion
apparatus and method for generating intermediate
representations for program code conversion. The program
code conversion apparatus determines which types of IR
nodes to generate in an intermediate representation of
subject code to be translated. Depending upon the
particular subject and target computing environments
involved in the conversion, the program code conversion
apparatus utilizes base nodes, complex nodes,
polymorphic nodes, architecture-specific nodes, or
some combination thereof, in generating the intermediate
representation.
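
By way of illustration only, the selection of IR node types according to the subject/target pairing might be sketched as follows. This is a minimal sketch and not the implementation described in the specification: the architecture names, the PAIRINGS and PREFERRED tables, and the functions configure and generate_ir are all hypothetical, and base nodes are assumed to be available under every configuration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class NodeKind(Enum):
    """The four IR node types named in the abstract."""
    BASE = auto()
    COMPLEX = auto()
    POLYMORPHIC = auto()
    ARCH_SPECIFIC = auto()


@dataclass(frozen=True)
class IRNode:
    kind: NodeKind
    op: str
    operands: tuple


# Hypothetical table: node types each (subject, target) pairing may use.
PAIRINGS = {
    ("x86", "mips"): {NodeKind.COMPLEX},
    ("ppc", "x86"): {NodeKind.POLYMORPHIC, NodeKind.ARCH_SPECIFIC},
}

# Hypothetical preference order per decoded operation, most specific first.
PREFERRED = {
    "add": [NodeKind.BASE],
    "shift_and_mask": [NodeKind.ARCH_SPECIFIC, NodeKind.COMPLEX],
    "fmul": [NodeKind.POLYMORPHIC],
}


def configure(subject, target):
    """Return the node types allowed for the selected pairing."""
    if (subject, target) not in PAIRINGS:
        raise ValueError(f"unsupported pairing: {subject} -> {target}")
    # Assumption: base nodes are always available as the generic fallback.
    return PAIRINGS[(subject, target)] | {NodeKind.BASE}


def generate_ir(decoded, allowed):
    """Build one IR node per decoded instruction, using the first
    preferred node type that the configuration allows."""
    nodes = []
    for op, *operands in decoded:
        candidates = PREFERRED.get(op, [])
        kind = next((k for k in candidates if k in allowed), NodeKind.BASE)
        nodes.append(IRNode(kind, op, tuple(operands)))
    return nodes


decoded = [("add", "r1", "r2"), ("shift_and_mask", "r3", 4, 0xFF), ("fmul", "f1", "f2")]
ir_a = generate_ir(decoded, configure("x86", "mips"))
ir_b = generate_ir(decoded, configure("ppc", "x86"))
```

In this sketch the same decoded instructions yield different node types under the two pairings: the first configuration falls back to complex and base nodes, while the second uses architecture-specific and polymorphic nodes where they are preferred.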